Parsing of Grammatical Relations for Databases of Spoken Language
Abstract
Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The explosive growth of available corpora of transcribed spoken language opens up new opportunities in that direction. High-accuracy parsers for spoken language will in turn provide a platform for developing a wide range of applications, as well as for advanced research on the nature of conversational interactions. One concrete field that is ripe for the application of such parsing tools is the study of child language acquisition. In this proposal, I describe a plan for developing a new approach to analyzing the syntactic structure of spontaneous conversational language in parent-child interactions. Specific emphasis is placed on the challenge of accurately annotating the English corpora in the CHILDES database with grammatical relations that are of particular interest and utility to researchers in child language acquisition. This work will involve rule-based and corpus-based natural language processing techniques, as well as a methodology for combining results from different approaches into a high-performance system. One practical application of this research is the automation of language competence measures used by clinicians and researchers of child language development. I will implement an automatic version of one such measurement scheme, providing not only a useful tool for the child language research community, but also a task-based evaluation framework for grammatical relation identification.
Similar resources
Incremental Dependency Parsing and Disfluency Detection in Spoken Learner English
This paper investigates the suitability of state-of-the-art natural language processing (NLP) tools for parsing the spoken language of second language learners of English. The task of parsing spoken learner-language is important to the domains of automated language assessment (ALA) and computer-assisted language learning (CALL). Due to the non-canonical nature of spoken language (containing fil...
Combining Rule-based and Data-driven Techniques for Grammatical Relation Extraction in Spoken Language
We investigate an aspect of the relationship between parsing and corpus-based methods in NLP that has received relatively little attention: coverage augmentation in rule-based parsers. In the specific task of determining grammatical relations (such as subjects and objects) in transcribed spoken language, we show that a combination of rule-based and corpus-based approaches, where a rule-based sy...
A Multi-Strategy Approach for Parsing of Grammatical Relations in Transcripts of Parent-Child Dialogs
Automatic analysis of syntax is one of the core problems in natural language processing. Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The recent explosive growth of online, accessible corpora of spoken language interactions opens up new opportunities for the development ...
Parsing of Grammatical Relations in Transcripts of Parent-Child Dialogs Thesis Summary
Automatic analysis of syntax is one of the core problems in natural language processing. Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The recent explosive growth of online, accessible corpora of spoken language interactions opens up new opportunities for the development ...
Robust grammatical analysis for spoken dialogue systems
We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic sources of information and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate pr...